
Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT — val_bpb 1.0790 (5-seed mean) #1533

Open: aryanbhosale wants to merge 1 commit into openai:main from aryanbhosale:submission/sp8192-fused-banking-muon97-5seed

@aryanbhosale

Record: SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT

val_bpb = 1.0790 (5-seed mean, std 0.0003) | ~15.99 MB | 8×H100 SXM

5-Seed Results

| Seed | TTT BPB | val_loss (nats) |
|------|---------|-----------------|
| 42   | 1.0788  | 2.7866          |
| 314  | 1.0789  | 2.7868          |
| 1337 | 1.0788  | 2.7867          |
| 7    | 1.0793  | 2.7880          |
| 999  | 1.0795  | 2.7884          |
| Mean | 1.0790  | 2.7873          |
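
A quick way to sanity-check the summary statistics is to recompute them from the per-seed values. The sketch below uses only the rounded numbers from the table, so the recomputed mean can differ from the reported one in the last digit. The variable names are illustrative and not taken from the submission.

```python
from statistics import mean, stdev

# Per-seed TTT BPB values copied from the results table (rounded to 4 decimals).
bpb_by_seed = {42: 1.0788, 314: 1.0789, 1337: 1.0788, 7: 1.0793, 999: 1.0795}

bpb_mean = mean(bpb_by_seed.values())
bpb_std = stdev(bpb_by_seed.values())  # sample standard deviation (n - 1)

print(f"mean={bpb_mean:.5f} std={bpb_std:.4f}")
```

The sample standard deviation rounds to the 0.0003 quoted in the headline.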

Merged SOTA (PR #1493): 1.0810 BPB. Delta: −0.0020 BPB / −0.0047 nats.
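
For readers relating the two metric columns, here is a minimal sketch of the arithmetic. It assumes the usual definition bpb = (nats per token / ln 2) × (tokens per byte); that relation and the variable names are assumptions for illustration, not code from the submission.

```python
import math

merged_sota_bpb = 1.0810   # PR #1493, as quoted above
mean_bpb = 1.0790          # 5-seed mean from the results table
mean_loss_nats = 2.7873    # 5-seed mean val_loss from the results table

# Improvement over the merged record, in bits per byte (negative is better).
delta_bpb = mean_bpb - merged_sota_bpb

# Assumed relation: bpb = (nats per token / ln 2) * (tokens / bytes).
# Solving for the average bytes per token implied by the reported means:
bytes_per_token = mean_loss_nats / (math.log(2) * mean_bpb)

print(f"delta_bpb={delta_bpb:+.4f}")
print(f"implied bytes/token={bytes_per_token:.3f}")
```

Under that assumption the reported means imply roughly 3.7 bytes per token for the SP8192 tokenizer on the validation set.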

Stack

Built on the PR #1523 base (@abaybektursun), with the hash embedding removed and a standard MLP (no Triton fused kernel):

  1. SP8192 + GPTQ embeddings + SDClip
  2. Parameter Banking — batched Newton-Schulz
  3. Triple Depth Recurrence (L3-5, 17 virtual layers)
  4. Parallel Residuals (L7+)
  5. Muon 0.97 (PR #1514 @dexhunter)
  6. QK-Gain 5.25, EMA 0.9965, WD 0.095, warmdown 0.72
  7. Score-First TTT (3 epochs, SGD lr=0.005)
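
The "17 virtual layers" figure in item 3 follows from looping three physical layers three times in an 11-layer model (the layer count reported by the smoke test on this PR): 11 + 2 × 3 = 17. The sketch below spells out that count. The function name, the 0-based indexing, and the block-wise loop order (3, 4, 5, 3, 4, 5, ...) are assumptions for illustration; the submission may order or index the repeats differently.

```python
def virtual_layer_order(num_layers, loop_start, loop_end, repeats):
    """Return the physical layer indices visited in one forward pass.

    Layers loop_start..loop_end (inclusive) run `repeats` times as a block;
    every other layer runs once. Hypothetical helper, not from train_gpt.py.
    """
    if not (0 <= loop_start <= loop_end < num_layers) or repeats < 1:
        raise ValueError("invalid loop configuration")
    prefix = list(range(0, loop_start))
    loop = list(range(loop_start, loop_end + 1))
    suffix = list(range(loop_end + 1, num_layers))
    return prefix + loop * repeats + suffix


order = virtual_layer_order(num_layers=11, loop_start=3, loop_end=5, repeats=3)
print(len(order))  # 17 virtual layers
print(order)
```

The recurrence adds effective depth without adding parameters, which matters under the 16 MB artifact limit.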

Compliance (Track B)

Score-first TTT (PR #461). No SLOT, no hash embed, no pre-quant TTT, no n-gram, no ETLB. All conditions from Issue #1017 satisfied. All artifacts < 16MB.
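
As a rough illustration of the score-first ordering, the sketch below records the loss on each validation chunk before taking any gradient step on that chunk, so no token is scored by weights that have already trained on it. `ToyModel` and `score_first_ttt` are hypothetical stand-ins; the actual submission adapts a transformer with SGD (3 epochs, lr=0.005) and its chunking is not shown here.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical stand-in for the network: predicts one constant value."""
    w: float = 0.0

    def loss(self, chunk):
        # Mean squared error of the constant prediction on this chunk.
        return sum((x - self.w) ** 2 for x in chunk) / len(chunk)

    def sgd_step(self, chunk, lr):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (self.w - x) for x in chunk) / len(chunk)
        self.w -= lr * grad


def score_first_ttt(model, chunks, epochs=3, lr=0.005):
    """Score each chunk before adapting on it; return the per-chunk losses."""
    scores = []
    for chunk in chunks:
        # 1) Score with weights that have not yet trained on this chunk.
        scores.append(model.loss(chunk))
        # 2) Only then adapt on the chunk that was just scored.
        for _ in range(epochs):
            model.sgd_step(chunk, lr)
    return scores


chunks = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
model = ToyModel()
scores = score_first_ttt(model, chunks, epochs=3, lr=0.1)
print(scores)
```

In this toy run the first chunk is scored with the untouched initial weights, and later chunks benefit only from adaptation on earlier, already-scored data.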

Credits

PR #1523 @abaybektursun, PR #1394 @clarkkev, PR #1514 @dexhunter, PR #1493 @bigbag, PR #1204 @msisovic

@MatoTeziTanka

Community Review — SP8192 + Banking + Triple Recurrence + Parallel Residuals + Muon 0.97 + TTT

Thanks @aryanbhosale — this is the same general stack family as your other submissions today (#1540 LoRA TTT + varlen) and the #1493 lineage. Clean compliance read below.

What I found (head SHA 9304879bb01246a0f8afa11e9f54e13f7e8f246b, records/track_10min_16mb/2026-04-11_SP8192_Banking_ParResid_TripleRecur_Muon97_TTT/train_gpt.py, decoded from the import lzma as L,base64 as B self-extracting shim — 58,957 bytes / 622 lines of actual source):

CPU smoke test (CT2038 proteus-engine, 2026-04-11):

IMPORT_OK               seconds=6.00
HP_NUM_LAYERS           11
HP_MODEL_DIM            512
HP_VOCAB_SIZE           8192
HP_QK_GAIN_INIT         5.0
HP_CODE_BYTES           19760 (shim; decoded 58,957 B)
SMOKE_TEST_PASS
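
For context on the two byte counts above, a self-extracting shim of this kind generally stores the LZMA-compressed source as a text payload and decompresses and executes it at import time. The following is a minimal sketch of that pattern using the standard-library `lzma` and `base64` modules. The helper names and the choice of `b64encode` are assumptions; the submission's shim may use a different encoding or layout.

```python
import base64
import lzma


def encode_payload(source: str) -> str:
    """Compress source text with LZMA and encode it as ASCII base64."""
    return base64.b64encode(lzma.compress(source.encode("utf-8"))).decode("ascii")


def decode_payload(payload: str) -> str:
    """Invert encode_payload: base64-decode, then LZMA-decompress."""
    return lzma.decompress(base64.b64decode(payload)).decode("utf-8")


def build_shim(source: str) -> str:
    """Wrap the payload in a small script that decodes and runs it."""
    payload = encode_payload(source)
    return (
        "import lzma as L,base64 as B\n"
        f"exec(L.decompress(B.b64decode('{payload}')).decode('utf-8'))\n"
    )


# Round-trip check on a toy source string.
toy_source = "ANSWER = 6 * 7\n"
assert decode_payload(encode_payload(toy_source)) == toy_source

# Executing the shim runs the decoded source in the given namespace.
namespace = {}
exec(build_shim(toy_source), namespace)
print(namespace["ANSWER"])
```

Reviewing the decoded text rather than the shim is what makes a static compliance read possible.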

Compliance summary:

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE. Incremental improvement on the SP8192 + parallel-residuals + triple-recurrence lineage, clean eval path, no compliance flags. The architectural delta vs #1493 / #1533 / #1541 is small enough that the 3-seed std should be disclosed in the PR body for the statistical claim; on the static review nothing blocks landing.


Reviewed by @MatoTeziTanka, The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_OK 6.00s, SMOKE_TEST_PASS. Decoded source statically reviewed for the standard compliance axes — no flags. Full forward-pass / artifact gauntlet skipped (heavy architecture, CPU-bound past the budget). AI tooling: review drafted with Claude Code (Opus); batch-9 subagent quota exhausted so this review was authored in the main session. SHA 9304879bb01246a0f8afa11e9f54e13f7e8f246b.

This was referenced Apr 11, 2026